-
Notifications
You must be signed in to change notification settings - Fork 19
feat: Adding support for shared-requestor #104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
4b4c42c
to
e6c5941
Compare
docs/automatic-ofed-upgrade.md
Outdated
@@ -108,6 +108,21 @@ controller manager watchers: | |||
builder.WithPredicates(requestor.NewConditionChangedPredicate(setupLog, | |||
requestorOpts.MaintenanceOPRequestorID))). | |||
``` | |||
|
|||
The requestor mode supports a `joint-requestor` flow where multiple operators can coordinate node maintenance operations: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: any chance to modify the terminology to: shared-requestor
or shared-node-maintenenace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider putting this in a sub-section of requestor
Assumptions: | ||
1. Cluster admin, which requires requestor joint mode, needs to make sure that all operators, utilizing maintenance OP, use same upgrade policy specs (same drainSpec). | ||
2. To be able to accommodate both GPU/Network drivers upgrade, `DrainSpec.PodSelector` should be set accordingly (hard-coded). | ||
* podSelector: `nvidia.com/ofed-driver-upgrade-drain.skip!=true,nvidia.com/gpu-driver-upgrade-drain.skip!=true` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just making sure, drain pkg will treat this as an OR right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be treated as AND condition.
Check the following example (I have tested it)
1. pod "foo" - nvidia.com/ofed-driver-upgrade-drain.skip=true label
2. pod "bar" - nvidia.com/gpu-driver-upgrade-drain.skip=true label
3. pod "baz" - no label
Pod "foo" (has nvidia.com/ofed-driver-upgrade-drain.skip=true):
nvidia.com/ofed-driver-upgrade-drain.skip!=true → FALSE (because it equals true)
nvidia.com/gpu-driver-upgrade-drain.skip!=true → TRUE (because this label doesn't exist, so it's not equal to true)
Result: FALSE AND TRUE = FALSE
Pod "bar" (has nvidia.com/gpu-driver-upgrade-drain.skip=true):
nvidia.com/ofed-driver-upgrade-drain.skip!=true → TRUE (because this label doesn't exist, so it's not equal to true)
nvidia.com/gpu-driver-upgrade-drain.skip!=true → FALSE (because it equals true)
Result: TRUE AND FALSE = FALSE
Pod "baz" (no labels):
nvidia.com/ofed-driver-upgrade-drain.skip!=true → TRUE (because this label doesn't exist, so it's not equal to true)
nvidia.com/gpu-driver-upgrade-drain.skip!=true → TRUE (because this label doesn't exist, so it's not equal to true)
Result: TRUE AND TRUE = TRUE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack, thx for the clarification so its equivalent to:
nvidia.com/ofed-driver-upgrade-drain.skip=true OR nvidia.com/gpu-driver-upgrade-drain.skip=true
pkg/upgrade/upgrade_requestor.go
Outdated
if _, exists := nm.Labels[GetUpgradeRequestorLabelKey(m.opts.MaintenanceOPRequestorID)]; !exists { | ||
nm.Labels[GetUpgradeRequestorLabelKey(m.opts.MaintenanceOPRequestorID)] = trueString | ||
} | ||
err := m.K8sClient.Patch(ctx, nm, client.MergeFrom(originalNm)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you say in comments that optimistic lock should be used (rightfully so), but i dont see you using:
client.MergeFromWithOptions(originalNm, client.MergeFromWithOptimisticLock{})
in patch calls here an below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was missing in my latest commit
pkg/upgrade/upgrade_requestor.go
Outdated
@@ -306,6 +324,114 @@ func (m *RequestorNodeStateManagerImpl) ProcessUpgradeRequiredNodes( | |||
|
|||
return nil | |||
} | |||
func (m *RequestorNodeStateManagerImpl) createOrUpdateNodeMaintenance(ctx context.Context, | |||
nodeState *NodeUpgradeState) error { | |||
// 1. check if nodeMaintenance obj exists. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these comments relevant in createOrUpdateNodeMaintenance
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe need to move some of it to docs ?
docs/automatic-ofed-upgrade.md
Outdated
@@ -108,6 +108,21 @@ controller manager watchers: | |||
builder.WithPredicates(requestor.NewConditionChangedPredicate(setupLog, | |||
requestorOpts.MaintenanceOPRequestorID))). | |||
``` | |||
|
|||
The requestor mode supports a `joint-requestor` flow where multiple operators can coordinate node maintenance operations: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider putting this in a sub-section of requestor
@@ -438,12 +557,12 @@ func GetRequestorOptsFromEnvs() RequestorOptions { | |||
if os.Getenv("MAINTENANCE_OPERATOR_REQUESTOR_ID") != "" { | |||
opts.MaintenanceOPRequestorID = os.Getenv("MAINTENANCE_OPERATOR_REQUESTOR_ID") | |||
} else { | |||
opts.MaintenanceOPRequestorID = "nvidia.operator.com" | |||
opts.MaintenanceOPRequestorID = DefaultNodeMaintenanceRequestorID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its not really a great ux IMO to enable this feature when the two env vars are unset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll be checking if DefaultNodeMaintenanceRequestorID
is used before using share-requestor
flow
e6c5941
to
ee07667
Compare
@cdesiniotis, @tariq1890 please help to review this PR |
b062823
to
3eaf949
Compare
Signed-off-by: Ido Heyvi <[email protected]>
3eaf949
to
246770e
Compare
1. Each operator adds its dedicated operator label to the nodeMaintenance object | ||
2. When a nodeMaintenance object exists, additional operators append their requestorID to the spec.AdditionalRequestors list | ||
3. During `uncordon-required` completion: | ||
- Non-owning operators remove themselves from spec.AdditionalRequestors list using optimistic locking |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
who is the "owning operator" ?
it is the operator that managed to create the NodeMaintenance object.
which means for a given NodeMaintenance obj (name of obj is the same in the shared-requestor mode for all cooperating operators) its the operator whose RequestorID is set in spec.requestorID
who is a "non owning" operator ?
its the operator that did not create the NodeMaintenances object, which means for a given NodeMaintenance obj its the operator whose RequestorID is present in spec.AdditionalRequestors
i would define these two roles and how its determined according to the content of NodeMaintenance obj.
@@ -130,6 +130,10 @@ func (m *InplaceNodeStateManagerImpl) ProcessUncordonRequiredNodes( | |||
if nodeState.NodeMaintenance != nil { | |||
continue | |||
} | |||
// check if if node upgrade is handled by requestor mode, if so node uncordon will be performed by requestor flow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we remove nodeState.NodeMaintenance != nil
check?
@@ -120,8 +124,18 @@ func (p ConditionChangedPredicate) Update(e event.TypedUpdateEvent[client.Object | |||
return false | |||
} | |||
|
|||
// check for matching requestor ID | |||
if newO.Spec.RequestorID != p.requestorID { | |||
// check if requestor label exists, if not ignore event |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we skip this label and just check if the requestorID is in spec.requestorID or in spec.additionalRequestors list ?
one less label we need to worry about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also would be great to have a separate predicate for checking if the nodeMaintenance is relevant for the requestor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use controller-runtime/pkg/predicate:NewPredicateFuncs()
instead of defining a new struct + implementing all callbacks.
@@ -48,6 +48,10 @@ const ( | |||
MaintenanceOPEvictionGPU = "nvidia.com/gpu-*" | |||
// MaintenanceOPEvictionRDMA is a default filter for Network OP pods eviction | |||
MaintenanceOPEvictionRDMA = "nvidia.com/rdma*" | |||
// DefaultNodeMaintenanceRequestorID is a default requestor ID for NVIDIA operators, in nodeMaintenance object | |||
DefaultNodeMaintenanceRequestorID = "nvidia.operator.com" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider removing the default requestor ID, i think every consumer of this lib (an operator) should provide its own ID. less risk of defaulting to the same ID.
@@ -165,13 +179,16 @@ func SetDefaultNodeMaintenance(opts RequestorOptions, | |||
func (m *RequestorNodeStateManagerImpl) NewNodeMaintenance(nodeName string) *maintenancev1alpha1.NodeMaintenance { | |||
nm := defaultNodeMaintenance.DeepCopy() | |||
nm.Name = m.getNodeMaintenanceName(nodeName) | |||
// mark nodeMaintenance object as updated by the requestor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as discussed, i think we can drop this label ?
return nil | ||
} | ||
|
||
m.Log.V(consts.LogLevelInfo).Info("appending new requestor", nm.Spec.RequestorID, "under 'AdditionalRequestors' list") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: restructure this and use m.opts.MaintenanceOPRequestorID
m.Log.V(consts.LogLevelInfo).Info("appending new requestor under AdditionalRequestors", "requestor", m.opts.MaintenanceOPRequestorID, "nodeMaintenance", client.ObjectKeyFromObject(nm))
if nm.Labels == nil { | ||
nm.Labels = make(map[string]string) | ||
} | ||
// only set label if it doesn't exist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we can drop label ?
// check if object is owned by deleting requestor, if so proceed to deletion | ||
if nm.Spec.RequestorID == m.opts.MaintenanceOPRequestorID { | ||
m.Log.V(consts.LogLevelInfo).Info("deleting node maintenance", | ||
nodeState.NodeMaintenance.GetName(), nodeState.NodeMaintenance.GetNamespace()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: m.Log.V().Info("....", "nodeMaintenance", client.ObjectKeyFromObject(nodeState.NodeMaintenance))
nodeState.NodeMaintenance.GetName(), nodeState.NodeMaintenance.GetNamespace()) | ||
err := m.deleteNodeMaintenance(ctx, nodeState) | ||
if err != nil { | ||
m.Log.V(consts.LogLevelWarning).Error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "failed to delete NodeMaintenance, node uncordon failed", "nodeMaintenance", client.ObjectKeyFromObject(nodeState.NodeMaintenance)
}) | ||
// remove requestorID label from the object | ||
// only remove label if it exists | ||
if nm.Labels != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: drop label ?
patch := client.MergeFromWithOptions(originalNm, client.MergeFromWithOptimisticLock{}) | ||
err := m.K8sClient.Patch(ctx, nm, patch) | ||
if err != nil { | ||
return fmt.Errorf("failed to update nodeMaintenance. %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: lets be a bit more verbose: "failed to remove requestor from additionalRequestors. failed to patch nodeMaintenance %s. %w", client.ObjectKeyFromObject(nodeState.NodeMaintenenace), err
m.Log.V(consts.LogLevelDebug).Info("deleting node maintenance", | ||
nodeState.NodeMaintenance.GetName(), nodeState.NodeMaintenance.GetNamespace()) | ||
// skip in case node undergoes uncordon by inplace flow | ||
// skip in case node undergoes uncordon by in-place flow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the flow here should be:
func (m *RequestorNodeStateManagerImpl) ProcessUncordonRequiredNodes(
ctx context.Context, currentClusterState *ClusterUpgradeState) error {
m.Log.V(consts.LogLevelInfo).Info("ProcessUncordonRequiredNodes")
for _, nodeState := range currentClusterState.NodeStates[UpgradeStateUncordonRequired] {
// check if if node upgrade is handled by requestor mode, if not, its handled by in-place flow so nothing to do
if _, exists := nodeState.Node.Annotations[GetUpgradeRequestorModeAnnotationKey()]; !exists {
continue
}
if nodeState.NodeMaintenance == nil {
// only when nodeMaintenance obj is deleted, node state should be updated to 'upgrade-done'
err := m.NodeUpgradeStateProvider.ChangeNodeUpgradeState(ctx, nodeState.Node, UpgradeStateDone)
if err != nil {
m.Log.V(consts.LogLevelError).Error(
err, "Failed to change node upgrade state", "state", UpgradeStateDone)
return err
}
// remove requestor upgrade annotation
err = m.NodeUpgradeStateProvider.ChangeNodeUpgradeAnnotation(ctx,
nodeState.Node, GetUpgradeRequestorModeAnnotationKey(), "null")
if err != nil {
return fmt.Errorf("failed to remove '%s' annotation from node %s. %w", GetUpgradeRequestorModeAnnotationKey(), nodeState.Node.Name, err)
}
return nil
}
err := m.deleteOrUpdateNodeMaintenance(ctx, nodeState)
if err != nil {
m.Log.V(consts.LogLevelWarning).Error(
err, "failed to delete or update NodeMaintenance. Node uncordon failed",
"nodeMaintenance", client.ObjectKeyFromObject(nodeState.NodeMaintenance), "node", nodeState.Node.Name)
return err
}
}
return nil
}
in case of the node's requestor mode annotation does not exist, its fully handled by in place (i.e uncordon node and move to upgrade state done)
The motivation of the following code change is to utilize same node-maintenance object(s), to efficiently synchronize required node operations, by multiple requestors. It will help to reduce consecutive node operation cycles to a single operation cycle.
First requestor which creates node-maintenance obj is its owner, and will be in-charge of creating/deleting node-maintenance obj.
Second requestor will check for existing node-maintenance obj, if so, it will add itself under
spec.additionalRequestors
list.